European Heart Journal - Digital Health

Oxford University Press (OUP)

Preprints posted in the last 7 days, ranked by how well they match European Heart Journal - Digital Health's content profile, based on 15 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.

1
Does ECG-Based AI Detect Aortic Stenosis Beyond Conventional LVH Criteria? An Analysis of the CLIDAS Database

Shimada, T.; Kodera, S.; Sawano, S.; Guan, J.; Saitoh, W.; Wakasa, S.; Ito, S.; Yanagishita, T.; Hayashi, Y.; Shibata, A.; Ito, A.; Otsuka, K.; Higashikuni, Y.; Okamura, H.; Tsujita, K.; Node, K.; Yamaguchi, O.; Makimoto, H.; Kabutoya, T.; Imai, Y.; Nakayama, M.; Sato, H.; Fujita, H.; Kohro, T.; Matoba, T.; Takeda, N.; Fukuda, D.; Nagai, R.

2026-06-08 cardiovascular medicine 10.64898/2026.06.07.26355087 medRxiv
Top 0.1%
14.3%

Background: Aortic stenosis (AS) is a progressive valvular disease associated with poor prognosis once symptoms develop, yet routine echocardiographic screening is impractical. While artificial intelligence (AI)-based electrocardiogram (ECG) models have shown promise for AS detection, it remains unclear whether they primarily reflect conventional left ventricular hypertrophy (LVH) voltage criteria or capture additional ECG features. Methods and Results: We developed a deep learning model using 244,816 ECGs from 51,713 patients across six academic institutions in Japan (CLIDAS database). AS labels were derived from inpatient Diagnosis Procedure Combination (DPC) codes. The model achieved an area under the receiver operating characteristic curve (AUC) of 0.849 (95% confidence interval 0.832-0.865) in the independent test cohort, with consistent performance across institutions, sex, and age. At a threshold of 0.1, sensitivity was 79.1%, specificity was 73.9%, and negative predictive value (NPV) was 98.0%. Conventional LVH voltage criteria (Sokolow-Lyon AUC 0.706; Cornell AUC 0.692) showed lower performance, and adding them to the AI model conferred no incremental benefit (AUC 0.849 vs. 0.847). Gradient-weighted class activation mapping (Grad-CAM) revealed predominant attention around QRS complexes in limb leads, beyond regions typically assessed in LVH evaluation. Conclusions: This multicenter AI-ECG model demonstrated strong discrimination for AS and captured ECG features beyond conventional LVH voltage criteria. The high NPV supports its use as a rule-out pre-screening tool.
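The reported NPV depends on how common AS is in the tested population, so a rule-out claim does not transfer automatically to settings with a different prevalence. The sketch below applies the standard NPV formula to the reported sensitivity and specificity. The prevalence values and the function name are illustrative assumptions, not figures from the abstract.

```python
def negative_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV = TN / (TN + FN), expressed as population fractions."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


# Operating point reported in the abstract at a threshold of 0.1.
SENSITIVITY = 0.791
SPECIFICITY = 0.739

# Prevalence values are assumptions chosen only to show the trend.
for prevalence in (0.02, 0.05, 0.10, 0.20):
    npv = negative_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  NPV={npv:.3f}")
```

Under this formula, the reported sensitivity, specificity, and NPV are mutually consistent with a test-cohort prevalence of roughly 7%. The abstract does not state the prevalence, so this is an inference rather than a reported value.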

2
AutoClip: AI-Guided TEE Semantic Segmentation for TEER: A Proof-of-Concept Study

Chen, M.; Li, X.; Yang, K.; Taramasso, M.

2026-06-06 cardiovascular medicine 10.64898/2026.05.29.26354195 medRxiv
Top 0.1%
13.9%

**Abstract** **Background:** Transcatheter edge-to-edge repair (TEER) is an established treatment for mitral regurgitation but remains highly dependent on operator experience and complex transesophageal echocardiography (TEE)-guided intraprocedural imaging. Artificial intelligence (AI)-based semantic segmentation may improve procedural reproducibility and intraprocedural guidance; however, no TEER-specific segmentation framework has been reported. **Objectives:** To develop and evaluate AutoClip, a clinician-driven AI-guided TEE semantic segmentation model designed for simultaneous delineation of mitral valve anatomy and in-vivo TEER device components. **Methods:** A retrospective proof-of-concept study was conducted using 987 intraprocedural TEE frames derived from 10 video clips in 3 patients undergoing MitraClip G4 implantation. Seven semantic labels, including mitral leaflets and device components, were manually annotated using ITK-SNAP. Following standardized preprocessing and region-of-interest extraction, an Attention U-Net architecture was trained frame-wise on bicommissural and corresponding X-plane TEE views. Model performance was assessed using mean intersection-over-union (IoU) and Dice coefficient on an independent test set. **Results:** The Attention U-Net demonstrated improved sensitivity to small device structures compared with conventional U-Net architectures. Preliminary training performance achieved a mean IoU of approximately 0.93, while independent test performance reached a mean IoU of 0.46 across foreground classes. Qualitative assessment demonstrated feasible simultaneous segmentation of mitral leaflets, clip arms, grippers, and delivery shaft during TEER procedures. **Conclusions:** AutoClip represents a proof-of-concept TEER-specific TEE semantic segmentation framework initiated through a clinician-oriented workflow without formal computer science expertise. 
Although preliminary accuracy remains modest due to limited sample size, this study establishes a reproducible pathway for future AI-assisted intraprocedural guidance systems and larger multicenter development efforts in structural heart interventions.
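For readers less familiar with the two overlap metrics named in the abstract, the following is a minimal sketch of per-class IoU and Dice on flattened label masks, averaged over foreground classes. The toy masks, label numbers, and function names are hypothetical, and this is not the authors' evaluation code.

```python
def iou_and_dice(pred, truth, label):
    """Return (IoU, Dice) for one class label, or None if the class is absent from both masks."""
    pred_hits = {i for i, p in enumerate(pred) if p == label}
    truth_hits = {i for i, t in enumerate(truth) if t == label}
    intersection = len(pred_hits & truth_hits)
    union = len(pred_hits | truth_hits)
    if union == 0:
        return None  # nothing to score for this class
    dice = 2 * intersection / (len(pred_hits) + len(truth_hits))
    return intersection / union, dice


def mean_foreground_iou(pred, truth, labels):
    """Average IoU over the given foreground labels, skipping absent classes."""
    scores = [iou_and_dice(pred, truth, label) for label in labels]
    ious = [score[0] for score in scores if score is not None]
    return sum(ious) / len(ious)


# Toy flattened masks: 0 = background, 1 and 2 = hypothetical foreground classes.
pred = [0, 1, 1, 2, 2, 0]
truth = [0, 1, 2, 2, 2, 0]
print(iou_and_dice(pred, truth, 1))
print(iou_and_dice(pred, truth, 2))
print(mean_foreground_iou(pred, truth, [1, 2]))
```

For a single class, Dice is always at least as large as IoU, which is worth keeping in mind when comparing a mean IoU such as 0.46 with Dice values reported elsewhere.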

3
EXHEART: A Fairness-Aware Explainable Stacked Ensemble for Cardiovascular Disease Classification with Cross-Instrument Disparity Attribution

Biswas, M. A.; Laila, A.

2026-06-05 health informatics 10.64898/2026.06.03.26354879 medRxiv
Top 0.1%
7.0%

Background: Machine learning models trained on population health surveys offer scalable tools for cardiovascular screening, but recurring methodological weaknesses undermine their credibility and equity: data leakage from synthetic oversampling, qualitative rather than quantitative explainability evaluation, and the absence of demographic fairness auditing at the clinical operating threshold. Methods: We present EXHEART, a leakage-free stacked ensemble pipeline trained on BRFSS 2015 (n = 253,680) and validated on BRFSS 2020 (n = 319,795; temporal transport and retrain) and a clinical cardiovascular examination dataset (n = 68,730). The pipeline combines XGBoost, LightGBM, Random Forest, and a multi-layer perceptron as base learners with 5-fold out-of-fold logistic regression stacking and Platt scaling calibration. A quantitative SHAP-LIME consistency framework, based on Kendall-tau rank correlation and Jaccard overlap, accompanies a decision-curve analysis, a subgroup-stratified SHAP interaction analysis, and an intersectional fairness audit (Sex x Age x Income) with threshold-shifting mitigation and a frontier of the fairness-utility trade-off. The framework also adds cross-instrument fairness-disparity attribution, an empirical diagnostic that provides evidence on whether an observed subgroup disparity is more consistent with a measurement-induced or a substantive explanation by re-validating it on a dataset that measures the same clinical construct objectively. On heart disease, this diagnostic associates 89% of the sex TPR gap (95% CI [0.65, 0.99]) with the self-reported survey outcome rather than with a substantive risk difference. Results: On BRFSS 2015, EXHEART achieves AUC-ROC = 0.850, AUPRC = 0.371, Brier score = 0.071, and reduces ECE by 96% (0.256 to 0.011) via Platt scaling. 
Global SHAP-LIME rank agreement is moderate-to-strong (Kendall-tau = 0.580, Spearman-rho = 0.818) with a substantial top-3 divergence (Jaccard@3 = 0.200), where Stroke flips from SHAP rank 8 to LIME rank 1. The Sex TPR gap is 0.124 at the screening threshold; intersectional Sex x Age disparities reach 0.649 among adequately-powered cells, 5.2x the single-attribute gap. Temporal transport to BRFSS 2020 collapses sensitivity from 0.776 to 0.267, while retraining restores AUC = 0.840 and ECE = 0.012. On clinical examination data, the Sex TPR gap collapses to 0.014; the attribution test indicates this gap is instrument-dependent, consistent with a measurement or outcome-definition explanation rather than a substantive risk difference. Cross-domain SHAP analysis identifies four instrument-independent CVD risk factors and two major portability failures. Conclusions: EXHEART combines three practices that population-scale cardiovascular classifiers usually apply in isolation: leakage-free training with calibrated probabilities, a test of whether the model's explanations are stable, and a fairness audit that examines intersecting subgroups rather than single attributes. Bringing them together proved worthwhile. The intersectional audit revealed disparities that single-attribute auditing missed, and the cross-instrument comparison indicated that much of the sex gap reflects how the outcome is measured in survey data rather than a substantive difference in risk. The temporal transport findings indicate that deployed BRFSS models warrant periodic monitoring and retraining to maintain clinical utility. EXHEART is a retrospective methodological evaluation on public de-identified data; it is not validated for direct clinical decision-making, diagnosis, or treatment recommendation without prospective clinical validation.
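The two agreement measures behind the SHAP-LIME consistency framework can be sketched in a few lines. This is a minimal illustration assuming two complete rankings of the same features with no ties; the feature names are placeholders, and the authors' implementation may differ (for example in tie handling).

```python
from itertools import combinations


def jaccard_at_k(ranking_a, ranking_b, k):
    """Overlap of the top-k sets: |A ∩ B| / |A ∪ B|."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    return len(top_a & top_b) / len(top_a | top_b)


def kendall_tau(ranking_a, ranking_b):
    """Kendall tau-a for two rankings of the same items, assuming no ties."""
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    concordant = discordant = 0
    for x, y in combinations(ranking_a, 2):
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    n_pairs = len(ranking_a) * (len(ranking_a) - 1) // 2
    return (concordant - discordant) / n_pairs


# Placeholder rankings (most important first); not the study's features.
shap_rank = ["f1", "f2", "f3", "f4", "f5", "f6"]
lime_rank = ["f6", "f1", "f4", "f5", "f2", "f3"]
print(jaccard_at_k(shap_rank, lime_rank, 3))
print(kendall_tau(shap_rank, lime_rank))
```

A Jaccard@3 of 0.200 corresponds to the two top-3 lists sharing exactly one feature (one shared out of five distinct), which is how a moderate global rank correlation can coexist with substantial disagreement at the top of the list.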

4
An AI-assisted feasibility evaluation of three photoplethysmography-derived microvascular reactivity signals in MIMIC-IV-WDB v0.1.0

Landry, T. C.; Kim, Y.

2026-06-06 health informatics 10.64898/2026.06.03.26354863 medRxiv
Top 0.1%
4.3%

Background. Capillary refill time, an examiner-dependent bedside test of distal microvascular perfusion, has become a resuscitation target in septic shock [1-4], motivating a continuous surrogate computed from the photoplethysmogram (PPG, the optical waveform the pulse oximeter on every ICU patient already records) [5-8]. Objective. We attempted three PPG-derived candidate measures on the MIMIC-IV Waveform Database (MIMIC-IV-WDB v0.1.0) and asked, by inspecting randomly drawn examples, whether each captured its intended physiology before any downstream modeling. Methods. MIMIC-IV-WDB v0.1.0 [9] was linked to MIMIC-IV [10]. The signals were a cuff-anchored perfusion-index recovery (reactive hyperemia when the cuff shares an arm with the probe), a slow Mayer-wave-band power ratio of the perfusion index (sympathetic vasomotor tone), and a per-beat diastolic exponential decay time constant (a refill-like recovery time). For each signal we drew 10 random examples at a fixed seed and checked them against a checklist fixed in advance. Each was read by the author and, separately, by MedGemma 1.5, a multimodal medical language model run locally. A synthetic test with a known time constant checked the third signal. Results. The cuff-anchored signal showed the expected occlusion-reperfusion shape on 268 of 6,236 evaluable cuff cycles (4.30%) in 15 of 19 patients, consistent with opposite-limb placement of the probe and cuff. The slow-band ratio returned a stable cohort value, but a clear, stationary peak appeared in only 4 of 10 random windows. The per-beat fit met its goodness-of-fit threshold in 10 of 10 beats, yet a cardiac-frequency heuristic flagged a possible fit on the heart-rate oscillation in 7 of 10, and in 5 of 17 patients the time constant lay where an exponential is indistinguishable from a straight line. A 0.5 Hz high-pass pre-filter implanted its own approximately 318 ms time constant regardless of truth.
The language model tracked the human on clear positives but reported the pattern present on every call it returned, never absent. Conclusions. Two of the three candidate signals did not reflect their intended physiology in most examples, and the third was constrained by sensor placement. Inspecting a few random raw inputs against a checklist written in advance is an inexpensive upstream check before downstream inference on PPG-derived microvascular signals.
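The roughly 318 ms artifact attributed to the 0.5 Hz high-pass pre-filter matches the time constant of a first-order filter, tau = 1 / (2 * pi * f_c). The sketch below reproduces that arithmetic. Treating the filter as first-order is an assumption, since the abstract does not state the filter design.

```python
import math


def first_order_time_constant_ms(cutoff_hz: float) -> float:
    """Time constant tau = 1 / (2 * pi * f_c) of a first-order filter, in milliseconds."""
    return 1000.0 / (2.0 * math.pi * cutoff_hz)


# Cutoff taken from the abstract; the first-order model is an assumption.
print(f"{first_order_time_constant_ms(0.5):.1f} ms")
```

Because this value depends only on the cutoff frequency, any decay constant estimated after such a pre-filter can be pulled toward it, which is consistent with the synthetic-test finding described above.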

5
CarotidMamba: Foundation Model-Enabled CTA Phenotyping of Symptomatic Carotid Plaques in a Multi-Center Retrospective Study

Liu, Y.-S.; Dou, X.-W.; Zheng, P.-Y.; Feng, W.; Ma, L.-J.; You, Y.-N.; Shao, G.-W.; Shen, J.-G.; Yu, X.; Qiao, C.; Cheng, Z.-W.; Li, Z.-W.; Su, F.; Zhang, B.-W.; Qu, X.-H.; Jiang, G.

2026-06-05 cardiovascular medicine 10.64898/2026.06.02.26354776 medRxiv
Top 0.1%
4.3%

Background: Treatment decisions for carotid atherosclerotic disease rely primarily on luminal stenosis, although plaque vulnerability and symptomatic status better reflect short-term cerebrovascular risk. A scalable CTA tool for automated phenotyping of symptomatic carotid disease is lacking. Materials & Methods: In this multi-institutional retrospective study, 689 patients (mean age, 67.9 ± 7.7 years; 366 men) from four hospitals were analyzed after screening 705 CTA examinations. 423 patients from one center were used for five-fold development and internal validation, and 266 patients from three centers for independent external validation. CarotidMamba, a deep learning framework combining dual foundation-model encoders with Mamba-based sequence modeling, was developed and benchmarked against clinical, radiomics, clinic-radiomics, CNN, and transformer comparators. Results: In the development cohort, CarotidMamba achieved an AUC of 0.839 (95% CI, 0.799-0.879) and accuracy of 0.825 (95% CI, 0.793-0.857), outperforming the strongest comparator by 0.066 and 0.050, respectively. External validation yielded AUCs of 0.897 (95% CI, 0.835-0.959) in YCH, 0.809 (95% CI, 0.720-0.898) in DCH, and 0.762 (95% CI, 0.649-0.875) in GH-NTC. CarotidMamba showed the lowest Brier score and expected calibration error across cohorts, with calibration slopes near 1.0. Conclusion: CarotidMamba provides an interpretable, clinically oriented, and externally validated CTA framework for phenotyping symptomatic carotid plaques, supporting vulnerability-aware imaging assessment beyond stenosis alone.
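The two calibration metrics named in the abstract can be sketched as follows. This is a minimal illustration assuming binary labels, equal-width probability bins, and a comparison of mean predicted probability with the observed event rate per bin; the bin count and the toy data are assumptions, and the authors' exact ECE variant is not stated.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted mean gap between predicted probability and event rate over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[index].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        mean_prob = sum(p for p, _ in members) / len(members)
        event_rate = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(event_rate - mean_prob)
    return ece


# Toy predictions, purely illustrative.
probs = [0.1, 0.2, 0.8, 0.9]
labels = [0, 0, 1, 1]
print(brier_score(probs, labels))
print(expected_calibration_error(probs, labels, n_bins=2))
```

Lower is better for both metrics. The Brier score mixes discrimination and calibration, while ECE isolates calibration but is sensitive to the binning choice.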

6
ECG-derived age deviation predicts cardiovascular diseases across lead configurations and cohorts

Aydogdu, D.; Gaber, F.; Sorooshmehr, A.; Akalin, A.

2026-06-08 cardiovascular medicine 10.64898/2026.06.05.26354974 medRxiv
Top 0.1%
4.3%

Cardiovascular diseases (CVDs) remain the primary global health burden, motivating the search for robust, non-invasive risk biomarkers. We harness a foundation model pretrained on over 10 million recordings to evaluate ECG-derived age deviation as a cross-cohort biomarker of CVD burden. A predictive model, trained exclusively on healthy subjects, achieved accurate age prediction. Diseased subjects exhibited significant positive age acceleration across multiple categories, with structural and ischemic heart diseases showing the largest effects. External validation in a hospital-based cohort (n=160,493) confirmed that age acceleration independently predicts all-cause mortality, with the strongest prognostic value in patients under 65 years. Furthermore, we demonstrated that disease discrimination and mortality prediction are preserved across 6-lead and single-lead configurations, supporting potential deployment in wearable or mobile devices. Our analysis also revealed a striking morphological confound from the complete left bundle branch block, leading us to propose absolute age deviation as a more robust, universal risk marker. These findings establish ECG-derived biological age deviation as a highly generalizable and clinically actionable biomarker for assessing cardiovascular risk. We have also developed a web application at https://bioinformatics.mdc-berlin.de/ECGage that allows users to easily test our framework.

7
Soft Tissue-to-Bone Ratio on Routine Bone Scintigraphy as an Opportunistic Imaging Biomarker of Cardiovascular-Kidney-Metabolic Burden

Spielvogel, C. P.; Kluge, K.; Ning, J.; Kumpf, K.; Nitsche, C.; Hengstenberg, C.; Slomka, P. J.; Hacker, M.

2026-06-09 cardiovascular medicine 10.64898/2026.06.08.26355179 medRxiv
Top 0.2%
4.0%

Background: Cardiovascular-kidney-metabolic (CKM) syndrome is a leading driver of cardiovascular morbidity and mortality. Whole-body molecular imaging is well-positioned to phenotype such syndromes, yet no imaging biomarker quantifies cumulative CKM burden. Bone scintigraphy with 99mTc-labeled bisphosphonates is widely performed and expanding with transthyretin amyloidosis assessment, under which Perugini grade 0 (absent cardiac uptake) is considered clinically benign. Objective: We hypothesized that the soft tissue-to-bone ratio (STBR) on these scans captures CKM burden and is an independent prognostic biomarker. Methods: We retrospectively analyzed 8,769 consecutive patients without cardiac uptake on 99mTc-DPD whole-body planar scintigraphy. The primary endpoint was all-cause mortality. Secondary endpoints were major adverse cardiovascular events (MACE) and heart failure hospitalization. Cox models were adjusted for ten established cardiovascular risk factors. Imaging-phenotype association (IPA) analysis mapped STBR to 1,210 clinical traits. STBR distribution across CKM stages was assessed in four prespecified analyses, including a non-cancer subgroup. Results: During a median follow-up of 5.1 years (IQR 2.5-8.2), 2,418 deaths occurred. Patients with prespecified STBR >0.5 (n=772, 8.8%) had significantly higher mortality (adjHR 1.73, 95% CI 1.54-1.94, p<0.0001) with an adjHR of up to 3.42 at higher thresholds (95% CI 2.05-5.42, p<0.0001). Hazard increased monotonically with STBR. STBR >0.5 was independently associated with MACE (adjHR 1.51, 95% CI 1.11-2.05, p=0.008) and heart failure hospitalization (adjHR 1.31, 95% CI 1.02-1.67, p=0.03). The association was robust across all prespecified subgroups and sensitivity analyses, including continuous STBR and patients without renal insufficiency. 
IPA analysis identified significant associations with type 2 diabetes, chronic kidney disease, chronic ischaemic heart disease, heart failure, atrial fibrillation, liver disease, amyloidosis, and hypertension among binary traits, as well as with CRP, NT-proBNP, BUN, cholesterol (inverse), and hemoglobin (inverse) among continuous parameters. STBR increased monotonically across CKM stages in all sensitivity analyses (all p<0.0001). Conclusions: STBR derived from routine 99mTc-DPD bone scintigraphy in patients without cardiac uptake is an independent prognostic imaging biomarker associated with cumulative cardiovascular-kidney-metabolic burden. As an opportunistic measure from scans already acquired at scale, STBR could refine CKM risk stratification at no additional cost, radiation, or acquisition time.

8
Context-Dependent Age-Group performance hierarchies limit fairness interventions in PPG-based heart rate prediction

Panchumarthi, L. Y.; Kataria, S.; Wu, Y.; Hu, X.; Fedorov, A.; Kwak, H. G.

2026-06-05 health informatics 10.64898/2026.06.04.26352929 medRxiv
Top 0.2%
2.8%

Background. Fairness-aware machine learning increasingly targets demographic performance disparities in clinical prediction, yet whether standard bias mitigation strategies genuinely improve equity in physiological signal analysis remains unclear. Age-based disparities in photoplethysmography (PPG)-based heart rate prediction present a particular challenge, as age-related performance differences may reflect context-dependent physiological structure rather than correctable artifacts. Methods. We evaluated three fairness interventions, inverse-frequency weighting (IF), Group Distributionally Robust Optimization (GroupDRO), and adversarial debiasing (ADV), applied via fine-tuning of a PPG foundation model across three clinical datasets spanning intensive care unit, laboratory, and consumer wearable contexts. Outcomes were assessed using a 2x2 framework classifying each intervention-dataset combination by the joint direction of change in mean absolute error (MAE) and fairness gap (FG) across age groups, yielding four outcome types: genuine improvement (G), leveling down (L), selective benefit (S), and both worse (W). Results. Across nine intra-domain conditions, no intervention simultaneously improved both MAE and FG (0/9 genuine improvement). The dominant pattern was leveling down (5/9): FG decreased but was accompanied by MAE degradation, indicating that apparent fairness gains were achieved at the cost of overall predictive performance. Age-group difficulty ordering varied across clinical contexts at baseline and was not preserved under intervention. In 18 cross-domain transfer conditions, genuine improvement was rare (4/18) and observed exclusively in non-MIMIC source configurations; models fine-tuned on MIMIC-sourced data yielded no genuine improvements (0/6). Embedding-level representation changes following fine-tuning did not reliably predict fairness outcomes. Conclusions. 
Age-based fairness interventions in PPG heart rate prediction indicate a leveling-down pattern rather than genuine equity improvement, suggesting that age-related performance gaps reflect context-dependent physiological structure not fully addressable through standard bias mitigation. Cross-domain transfer further amplifies this instability. These findings suggest that fairness evaluation frameworks for age-stratified physiological prediction should account for context-dependent performance structure rather than treating observed gaps as correctable bias.
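The 2x2 outcome framework can be expressed as a small classifier over the changes in MAE and fairness gap relative to baseline. This is a sketch under stated assumptions: the abstract defines genuine improvement and leveling down explicitly, whereas the mapping of "selective benefit" to "MAE improves while the gap does not" and the treatment of zero change as "not improved" are inferred here.

```python
def classify_outcome(delta_mae: float, delta_fg: float) -> str:
    """Map changes relative to baseline (negative = improvement) to G, L, S, or W."""
    mae_improved = delta_mae < 0
    fg_improved = delta_fg < 0
    if mae_improved and fg_improved:
        return "G"  # genuine improvement: lower error and smaller gap
    if fg_improved:
        return "L"  # leveling down: gap narrows while overall error worsens
    if mae_improved:
        return "S"  # selective benefit (assumed): error improves, gap does not
    return "W"  # both worse


# Hypothetical deltas, not results from the study.
for deltas in [(-0.4, -0.1), (0.3, -0.2), (-0.2, 0.1), (0.5, 0.3)]:
    print(deltas, classify_outcome(*deltas))
```

Classifying by the joint direction of change is what separates a narrower gap bought with worse overall error from a genuine gain, which a fairness metric reported alone would not reveal.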

9
Resolving Diagnostic Discordance in Group 2 Pulmonary Hypertension Through Staged Physiologic Testing: Insights From PVDOMICS

Rischard, F.; PVDOMICS Study Group; Mendoza, M.; Insel, M.; Beck, G.; Erzurum, S.; Frantz, R. P.; Finet, J. E.; Hassoun, P.; Hemnes, A. R.; Hill, N. S.; Horn, E. M.; Leopold, J. A.; Mathai, S. C.; Mehra, R.; Reddy, Y. N. V.; Rosenzweig, E. B.; Systrom, D. M.; Tang, W. H. W.; Waxman, A.; Borlaug, B. A.

2026-06-10 cardiovascular medicine 10.64898/2026.06.04.26354961 medRxiv
Top 0.2%
2.4%

Background World Symposium on Pulmonary Hypertension (WSPH) Group 2 pulmonary hypertension (PH) is a clinically integrated phenotype attributed to left heart disease, whereas pre- versus post-capillary classification is operationalized primarily by pulmonary capillary wedge pressure (PCWP). Although current recommendations emphasize contextual interpretation and provocative testing for intermediate PCWP values, the relationship between PCWP-based classification and underlying phenotype has not been systematically evaluated. We aim to quantify phenotype-hemodynamic discordance across the PCWP spectrum and evaluate a staged physiology-guided framework incorporating inhaled nitric oxide (iNO), ventricular geometry, and provocative testing. Methods We studied 1,032 participants from the NHLBI-sponsored PVDOMICS cohort with multidisciplinary adjudicated phenotypes integrating clinical, imaging, physiologic, and hemodynamic data. Stage-specific PCWP thresholds classified pre- versus post-capillary physiology at rest, during iNO, and during provocation (fluid challenge or invasive cardiopulmonary exercise testing [iCPET]). Echocardiographic right ventricular-to-left ventricular (RV/LV) ratio was evaluated as a marker of ventricular interdependence. Restricted cubic spline and staged concordance analyses defined certainty-based PCWP ranges and incremental diagnostic yield. Results Adjudicated Group 2 phenotype was present in 37.0% of participants. Resting PCWP demonstrated good discrimination (AUC 0.86), but substantial bidirectional phenotype-hemodynamic discordance persisted across intermediate PCWP ranges. At a resting PCWP of 12 mmHg, 25% of participants classified as pre-capillary had adjudicated Group 2 PH, whereas at 18 mmHg, 35% classified as post-capillary remained discordant non-Group 2. Concordance did not approach 90% until PCWP values were <9 mmHg or >24 mmHg. Dynamic testing incrementally improved concordance within these overlap zones. 
Nearly half of adjudicated Group 2 PH participants (46.5%) were not identified by resting PCWP alone; incorporation of iNO and provocative testing increased cumulative Group 2 identification by 63.4% and improved sensitivity from 79.9% to 83.7%. Model discrimination improved from an AUC of 0.863 to 0.908 (likelihood-ratio P<0.001). iNO increased PCWP in discordant Pre/G2 participants, unmasking latent left-sided limitation, while lowering PCWP in discordant Post/NonG2 participants, consistent with ventricular interdependence. RV/LV ratio ≥0.94 reduced discordant Post/NonG2 classification by 70.5%, and incorporation of PCWP/cardiac output slope improved physiologic specificity during exercise. Conclusions Group 2 PH is a dynamic, load-dependent phenotype inadequately characterized by resting PCWP alone. Intermediate PCWP values represent continuous probabilities of bidirectional discordance rather than discrete diagnostic states. A staged physiology-guided approach integrating iNO, ventricular geometry, and provocative testing improves concordance between hemodynamic classification and clinically integrated phenotype assignment.

10
The LV-LA Health Score: A Novel Marker of Integrated Myocardial Structure and Function

Estrella, F.; Chiswell, K.; Sun, J.-L.; Duckworth, M.; Vasan, R. S.; Pattison, B.; Provencher, A.; Judd, S. E.; Velagaleti, R.; Douglas, P. S.; Bloomfield, G. S.; Soliman, E.; Chen, Y.-D. I.

2026-06-09 cardiovascular medicine 10.64898/2026.06.08.26353379 medRxiv
Top 0.3%
2.0%

Background Myocardial remodeling precedes symptomatic heart failure and is important to detect early. We assessed feasibility and clinical correlates of a novel integrated assessment of myocardial remodeling in a large rural cohort in the Southeastern United States. Methods Echoes were obtained with AI assistance (Caption guidance) in 3100 adults in the NHLBI-funded RURAL cohort study. Of those, 1895 had quantifiable global longitudinal strain (GLS), left ventricular mass (LVM), and left atrial volume (LAV). LV-LA Health was based on a simple count of sex-specific abnormalities (0-3), indexed to body surface area (BSA) or height (Table 1). Relationships with demographics and risk factors were compared with Spearman correlation and Mantel-Haenszel tests, with moderate and severe results combined. Results Median (IQR) age was 49 (40-58). Impaired LV-LA Health is common even in a low PREVENT cardiovascular (CV) risk population (median 10-year risk 3.3%; 25th, 75th percentiles 1.2, 7.2) with preserved ejection fraction (EF; 60%; 57, 62). The prevalence of abnormalities differed greatly by indexing method: 18.2% with BSA (15.1% mild; 3.1% mod/severe) vs 51% with height (38.3% mild; 12.7% mod/severe) (Figure 1). LV-LA impairment increased with age, PREVENT CV risk score and cardiovascular risk factors (hypertension, diabetes, dyslipidemia, obesity); all p<0.001. Impairment was more common in Black vs White people (p<0.001) and differed by sex only with height indexation. Conclusions A novel LV-LA health composite of routinely acquired echocardiographic measures identifies substantial subclinical cardiac remodeling in a middle-aged rural community cohort, not detected by PREVENT score or ejection fraction. This is the first application of this framework in a large, unselected community sample. Indexation method affects prevalence, with BSA likely underestimating risk in adiposity-enriched populations. 
Findings suggest a high rural burden, and longitudinal evaluation with future CV events is ongoing.

11
Polygenic risk of cardiovascular disease manifests in cardiac structure and function

Felici, B.; Ritchie, S. C.; Khullar, S.; Foguet, C.; Persyn, E.; Manikpurage, H. D.; Liu, Y.; Lambert, S. A.; Ip, S.; Rudd, J. H. F.; Inouye, M.

2026-06-08 cardiovascular medicine 10.64898/2026.06.07.26354998 medRxiv
Top 0.3%
1.9%

Cardiovascular diseases (CVDs) are highly heritable, but pathogenesis at the organ and physiological level is still poorly defined. Polygenic risk scores (PRSs), which estimate individual genetic susceptibility to a disease, may allow for the identification of associated abnormal organ structures. Ultimately, identifying where cardiovascular polygenic risk manifests can guide early interventions, shape mechanistic hypotheses, and motivate prevention trials for cardiac remodelling. This study investigated the association between PRSs for five common CVDs [heart failure (HF), coronary artery disease (CAD), atrial fibrillation (AF), abdominal aortic aneurysm (AAA) and ischaemic stroke (IS)] and 28 imaging-derived phenotypes (IDPs) from cardiac magnetic resonance imaging of ~62,000 participants in UK Biobank. To investigate the cardiac features associated with elevated polygenic risk of CVDs, we tested CVD PRSs against cardiac IDPs and identified 97 significant associations (FDR ≤ 0.05). We further identified 32 significant putative mediators between CVD PRSs and incident disease events, revealing that across CVDs, polygenic risk manifested as distinct patterns in cardiac structures. HF implicated all cardiac chambers, including left ventricular and left atrial dysfunction alongside enlarged aorta. AF was characterised by biatrial enlargement and reduced ejection fractions, most prominently in the left atrium but also involving left ventricular wall thickness. IS exhibited left ventricular hypertrophy and left atrial dysfunction, while CAD predominantly involved left ventricular hypertrophy. AAA was primarily characterised by enlarged descending aorta. Overall, cardiac IDPs mediated a substantial proportion of polygenic risk for CVDs, in particular for HF. Taken together, our results show that cardiac structure and function lie on the pathway between polygenic risk and cardiovascular events.

12
Clinician-Centered Evaluation of Large Language Model-Generated Discharge Summaries for Longer Hospitalizations: Insights from Hospitalists and Primary Care Physicians

Osborne, T.; Mahmud, T.; Zheng, X.; Jampala, S.; Abbasi, S.; Hong, S.; Kranz, K.; Lee, S.; Ng, P.; Odekon, K.; Schachter, L.; Sexton, R.; Spinnato, T.; Tharakan, M.; Wu, Z.; Wang, F.; Wong, R.

2026-06-05 health systems and quality improvement 10.64898/2026.06.03.26354858 medRxiv
Top 0.3%
1.7%

Although large language models (LLMs) have shown promise for discharge summary generation, their value may be greater in longer hospitalizations, where increasing documentation volume and complexity increase both clinician burden and the risk of communication failures during transitions of care. Prior evaluations of LLM-generated discharge summaries have largely involved shorter stays and have rarely examined receiving-clinician priorities or incidental finding reporting. We compared LLM-generated and human-authored discharge summaries for 60 Internal Medicine hospitalizations lasting 7 to 21 days, with paired assessment by hospitalists and primary care physicians (PCPs). Clinician reviewers preferred LLM-generated summaries for 95% of encounters and rated them higher for quality, readability, factuality and completeness. PCPs, the primary recipients responsible for post-discharge care, found that LLM-generated summaries were better for understanding and communicating hospital care to patients, and providing follow-up care. LLM-generated summaries had fewer annotated errors, primarily due to fewer omissions, without increased estimated harm potential or likelihood compared with human-authored summaries. Benefits of LLM-generated summaries were especially salient for PCPs, who identified more omissions with greater downstream likelihood of harm than hospitalists. This underscores the importance of designing transition documents around the needs of clinicians assuming care post-discharge. LLM identification of radiology incidental findings was generally accurate and appropriate, suggesting potential to improve follow-up of clinically relevant findings. These findings extend prior work by demonstrating clinical value of LLMs in summarizing longer, complex hospitalizations and highlighting the value of stakeholder-centered design in clinical AI systems. 
Together, they support supervised LLM-assisted discharge summarization as a tool to reduce cognitive burden, improve documentation quality, and enhance transition-of-care communication.

13
Frozen elephant trunk repair in heritable thoracic aortic disease: Impact of genetic aortopathy on long-term outcomes - A multicenter analysis

Berger, T.; Peterss, S.; Pitts, L.; Kempfert, J.; Nucera, M.; Yildiz, M.; Holubec, T.; Haas, I.; Czerny, M.; Kreibich, M.; Kletzer, J.; Discher, P.; Bialczak, J.; Demal, T. J.; Detter, C.; Gasser, S.; Luehr, M.; Alokhina, A.; Tsagakis, K.; Dohle, D.-S.; Pfeiffer, P.; Radner, C.; Pichlmaier, M.; Goebel, N.; Rylski, B.; Arnold, Z.; Grabenwoeger, M.; Stelzmueller, M.-E.; Dumfarth, J.; Schoenhoff, F. S.; Brickwedel, J.

2026-06-10 cardiovascular medicine 10.64898/2026.06.09.26355316 medRxiv
Top 0.5%
0.9%

Aims This multicenter study aims to compare outcomes of total aortic arch replacement (TAR) using the frozen elephant trunk (FET) technique in patients with and without heritable thoracic aortic disease (HTAD) and to assess whether HTAD influences postprocedural adverse aortic events (AAEs). Methods From 06/2007 to 05/2024, aortic databases from 13 European centers were screened for HTAD patients undergoing TAR with FET. All consecutive non-HTAD patients with dissection or aneurysm from the four core centers served as the comparator group. The primary outcome was AAE, a composite of diameter progression, distal stent graft-induced new entry (dSINE), malperfusion, rupture and pseudoaneurysm at 5 years after FET implantation. Results Of 2739 FET patients, 196 (7.2%) were diagnosed with HTAD. The control group consisted of 867 non-HTAD FET patients. Marfan syndrome was the most common condition (72%), followed by Loeys-Dietz syndrome (11%), vascular Ehlers-Danlos syndrome (5.6%) and Turner syndrome (2.0%). Seventeen (8.8%) patients were diagnosed with non-syndromic HTAD (ns-HTAD). At 5 years, 46 (24%) AAEs had occurred in the HTAD group and 169 (20%) in the non-HTAD group (p=0.2). Diameter progression was the most common event (10% vs. 12%; p=0.6), followed by dSINE (5.8% vs. 4.5%; p=0.5), malperfusion (4.2% vs. 3.3%; p=0.5), rupture (2.1% vs. 0.7%; p=0.09) and pseudoaneurysm (0.5% vs. 0.2%; p=0.5). Conclusions The FET technique appears safe and effective for acute and chronic aortic disease in HTAD patients, with outcomes comparable to non-HTAD cases and no increase in graft-related complications, challenging traditional concerns about stent graft use in genetically mediated aortic disease.

14
Sensor Geometry, Not Signal Processing, Limits Opportunistic Detection of Capillary-Refill-Like Signals by Rule-Based and Language-Model Methods in Archived ICU Waveforms

Landry, T. C.; Kim, Y.

2026-06-09 intensive care and critical care medicine 10.64898/2026.06.07.26355129 medRxiv
Top 0.6%
0.8%

Background. Capillary refill time is a resuscitation target in septic shock [1-4], but bedside measurement is examiner-dependent. An ICU monitor co-records a photoplethysmogram on the pulse oximeter and intermittent noninvasive blood pressure cuff cycles; if the probe and the cuff share a limb, each cycle is an unplanned vascular occlusion test on the distal microvascular bed. Standard practice places the two on opposite limbs. Objective. To measure how often, in MIMIC-IV-WDB v0.1.0, charted cuff cycles show the photoplethysmographic morphology expected of a same-limb cuff and probe, and to characterize the candidate capillary refill-like signal when that morphology is present. Methods. MIMIC-IV-WDB v0.1.0 [5] was linked to the MIMIC-IV clinical database [6]. A pre-registered rule-based detector identified candidate occlusion-reperfusion signatures on the 1-Hz perfusion-index envelope around each charted cuff timestamp. The primary endpoint was the proportion of cuff cycles suitable for analysis that were detector-positive at a 15-second reperfusion threshold, with 95% confidence intervals estimated by resampling patients at a fixed seed. A secondary analysis used a locally hosted multimodal language model (a Gemma-3 derivative on a non-device server) to adjudicate the same signature on perfusion-index plots; no MIMIC-IV-WDB content left the workstation. Results. Of 9,224 charted cuff cycles, 8,909 had a usable pulse-oximeter waveform, and 268 cycles in 15 patients (4.30% of the 6,236 cuff cycles suitable for analysis, 95% CI 2.60 to 6.03) met the primary 15-second threshold. The language model adjudicated the same cycles and called 1,367 of the 8,909 cycles with a usable waveform (15.34%) signature-present, roughly five times the detector's count. Because no laterality ground truth exists, agreement with a single blinded reader served as the comparator rather than accuracy.
The two methods were about equally concordant with the reader: precision was 0.25 (95% CI 0.14 to 0.39) for the detector and 0.24 (95% CI 0.10 to 0.35) for the language model, although reweighting to the full population of cycles with a usable waveform lowered the language model to 0.030 (95% CI 0.009 to 0.053). These estimates are reference-limited: a blinded re-read of a 150-card subsample showed only moderate intra-rater reliability (Cohen's kappa 0.46 to 0.59) with systematic undercalling on the first pass, and rescoring against the corrected re-read roughly doubled precision for both methods. Conclusions. Opportunistic extraction of capillary refill-like signals from archived ICU pulse oximetry is limited in two distinct ways. First, sensor geometry limits how often the signal is recordable: cuff cycles rarely show the morphology expected of a same-limb cuff and probe pair, consistent with opposite-limb placement, so the bottleneck is geometry rather than signal processing. Second, the modest reliability of morphology adjudication limits how well any single flagged cycle can be confirmed: against a blinded reader the detector is a usable screen but a noisy confirmer, the reference is itself only moderately reliable, and the language model is no more concordant despite flagging many more cycles. The minority of cycles in which the morphology appears contain a candidate signal that may merit prospective study under controlled placement with laterality recorded.
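
The abstract above reports confidence intervals estimated by resampling patients at a fixed seed. A minimal sketch of such a patient-level bootstrap is given here for orientation only. The function name, the per-patient (positive, total) input layout, the 2,000 replicates, the percentile interval, and the toy data are all assumptions for illustration and are not taken from the preprint.

```python
import random

def patient_bootstrap_ci(per_patient, n_boot=2000, seed=0, alpha=0.05):
    """Percentile bootstrap CI for a pooled proportion, resampling whole patients.

    per_patient: list of (positive_cycles, total_cycles) tuples, one per patient.
    All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(per_patient)
    estimates = []
    for _ in range(n_boot):
        # Draw patients with replacement so within-patient correlation is kept.
        sample = [per_patient[rng.randrange(n)] for _ in range(n)]
        pos = sum(p for p, _ in sample)
        tot = sum(t for _, t in sample)
        if tot > 0:
            estimates.append(pos / tot)
    estimates.sort()
    lo = estimates[int((alpha / 2) * len(estimates))]
    hi = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    point = sum(p for p, _ in per_patient) / sum(t for _, t in per_patient)
    return point, lo, hi

# Toy data (made up): most patients contribute no detector-positive cycles.
toy = [(0, 40)] * 20 + [(10, 40)] * 3 + [(25, 50)] * 2
point, lo, hi = patient_bootstrap_ci(toy)
print(f"{point:.3f} ({lo:.3f} to {hi:.3f})")
```

Resampling patients rather than individual cuff cycles matters when a few patients contribute most of the positive cycles, as the abstract's 268 cycles in 15 patients suggests.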

15
From Charting Burden to Workflow Signal: Retrospective Validation of Documentation-Density Measures for ICU Complexity and Long-Stay Risk

Collier, A.

2026-06-06 health informatics 10.64898/2026.06.04.26354922 medRxiv
Top 0.6%
0.7%

Background Electronic health record documentation patterns may reflect workflow complexity, monitoring intensity, and operational strain in intensive care settings. However, documentation-derived features can be sensitive to local documentation culture, data capture systems, and outcome definitions. Retrospective validation across multiple datasets is therefore needed before these signals are used in workflow intelligence or clinical AI governance tools. Objective To evaluate whether documentation-density and documentation-timing features show reproducible retrospective signal for ICU workflow complexity and long-stay proxy outcomes across de-identified critical care datasets, while distinguishing workflow and long-stay associations from unsupported claims about mortality prediction, burden reduction, or deployment readiness. Methods We synthesized retrospective validation results from de-identified ICU and workflow datasets generated through a prespecified documentation-density validation program. Feature families included Documentation Burden Score style features, Shift-End Documentation Rate style features, documentation reliability style metadata, and all-documentation feature sets where available. Outcomes included long ICU length of stay proxies, mortality where available, and workflow proxy endpoints. Models compared baseline feature sets with enhanced models containing documentation-density or workflow features. Performance was summarized using area under the receiver operating characteristic curve, Brier score where reported, delta AUROC, bootstrap confidence intervals where reported, and label-shuffle controls where available. Results The strongest external long-stay proxy evidence came from the NWICU chartevents analysis, which included 28,612 ICU stays, 20,267 stays with chart events, and 9,619,759 chart events. For ICU length of stay greater than the median, baseline AUROC was 0.5252. 
Enhanced AUROC was 0.9512 for Documentation Burden Score features, 0.9214 for Shift-End Documentation Rate features, 0.8470 for documentation reliability style features, and 0.9517 for all documentation features. Corresponding label-shuffle enhanced AUROCs were near random, ranging from 0.4897 to 0.5064. For ICU length of stay greater than the 75th percentile, baseline AUROC was 0.5155. Enhanced AUROC was 0.9433 for Documentation Burden Score features, 0.9194 for Shift-End Documentation Rate features, 0.8118 for documentation reliability style features, and 0.9427 for all documentation features, with label-shuffle enhanced AUROCs from 0.4836 to 0.4999. Additional retrospective support was observed in eICU workflow analyses, HiRID first-24-hour documentation-density analyses, MIMIC-IV HF ICU internal analyses, MIMIC-IV-Note metadata extensions, and nursing-chart or lab density proxy analyses. However, cross-institution discrimination transfer was weak without recalibration, and several analyses remained proxy validations rather than final clinical validations. Conclusions Documentation-density and documentation-timing features show promising retrospective signal for ICU workflow complexity and long-stay proxy outcomes, especially in NWICU chartevents and selected internal dataset-specific analyses. These findings support further preregistered, prospective, silent-mode validation of documentation-derived workflow intelligence. They do not establish prospective clinical performance, mortality reduction, clinician burden reduction, autonomous deterioration prediction, or deployment readiness.
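
The label-shuffle controls mentioned in the abstract above can be pictured with a small sketch. It is a simplification under stated assumptions: the scores are held fixed and only the labels are permuted, whereas a full control would usually refit the model on shuffled labels. The function names, the synthetic data, and the 200 shuffles are hypothetical.

```python
import random

def auroc(labels, scores):
    """AUROC as the chance that a random positive outscores a random negative.

    Ties count as 0.5. Quadratic in sample size, which is fine for a sketch.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def label_shuffle_aurocs(labels, scores, n_shuffles=200, seed=0):
    """AUROC values obtained after randomly permuting the labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        results.append(auroc(shuffled, scores))
    return results

# Synthetic scores (made up): positives are shifted upward relative to negatives.
data_rng = random.Random(1)
labels = [1] * 100 + [0] * 100
scores = [data_rng.gauss(1.5, 1.0) for _ in range(100)] + \
         [data_rng.gauss(0.0, 1.0) for _ in range(100)]

observed = auroc(labels, scores)
null = label_shuffle_aurocs(labels, scores)
print(f"observed {observed:.3f}, shuffled mean {sum(null) / len(null):.3f}")
```

A shuffled-label AUROC near 0.5 indicates that the evaluation pipeline itself does not manufacture discrimination. It does not rule out features that encode the outcome directly.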

16
The association of Red Cell Distribution Width and Red Cell Distribution Width related indices with the in-Hospital Mortality of Congestive Heart Failure in a retrospective observational cohort study

Wang, D.; Yuan, X.; Lv, D.; Wang, Y.

2026-06-04 cardiovascular medicine 10.64898/2026.05.29.26354291 medRxiv
Top 0.6%
0.7%

Background: Red cell distribution width (RDW), a readily available hematological parameter reflecting erythrocyte size heterogeneity, has been increasingly recognized as a prognostic marker in congestive heart failure (CHF), with elevated levels independently associated with adverse outcomes. However, RDW-derived composite indices, particularly the RDW-to-platelet ratio (RPR) and RDW-to-hemoglobin ratio (RHR), which integrate inflammatory, hemostatic, and oxygen-delivery pathways, remain largely unexplored in CHF populations. Whether these indices provide incremental prognostic value beyond RDW alone in critically ill patients with CHF has not been established. Methods: This retrospective cohort study included 30,409 participants from the MIMIC-IV and eICU-CRD databases. Multivariable logistic regression, restricted cubic spline (RCS) analysis, and subgroup analyses were employed to evaluate the associations between RDW, RDW-derived indices (RPR and RHR), and in-hospital mortality in patients with congestive heart failure. Results: Based on a pooled cohort of 30,409 patients with CHF from the MIMIC-IV and multi-center eICU-CRD databases (15,983 and 14,426, respectively), 16,295 (53.6%) were male and 14,114 were female, with a median age of 71.7 years. The mean RDW was 16.0 ± 2.5, and the overall in-hospital mortality rate was 12.6%. Higher RDW quintiles were associated with progressively increased in-hospital mortality. In the fully adjusted model, RDW, RPR, and RHR were all significantly associated with increased in-hospital mortality, with adjusted odds ratios (ORs) of 2.46 (95% CI: 2.17-2.79) for RDW, 1.55 (95% CI: 1.38-1.73) for RPR, and 2.43 (95% CI: 2.09-2.82) for RHR. Sensitivity analyses using restricted cubic splines demonstrated that the associations of RDW and RHR with in-hospital mortality were linear (P for nonlinearity > 0.05), whereas that of RPR exhibited a non-linear pattern (P = 0.02 for non-linearity). Conclusions:
Elevated RDW, RPR, and RHR were independently associated with increased in-hospital mortality in patients with congestive heart failure. Notably, RPR exhibited a non-linear threshold association with in-hospital mortality.

17
Clinical, Aetiology and Temporal Trends of Hospitalised Heart Failure Patients in a Private Tertiary Hospital in Sierra Leone (2021-2025)

Russell, J. B. W.; Smith, M.; Alhassan, Y.; Coker, J. M.; Tejan, E. A.; Bharat, K.; Meena Kumari, M. K.; Mahdi, O. Z.; Lisk, D. R.

2026-06-08 cardiovascular medicine 10.64898/2026.06.06.26355075 medRxiv
Top 0.6%
0.7%

Background: Heart failure (HF) is a complex clinical syndrome of growing public health concern in sub-Saharan Africa, yet data from Sierra Leone are absent. The aim of the study is to characterise the clinical profile, aetiology and temporal trends of hospitalised HF patients at Choithrams Memorial Hospital (CMH), Freetown, Sierra Leone, to confirm specific management strategies. Methods: This single-center, retrospective observational cohort study analysed data on HF patients (>18 years) admitted to CMH between January 2021 and 31 December 2025. The clinical definition of HF was based on the Framingham criteria and the European Society of Cardiology (ESC) guidelines, including standard echocardiographic parameters. All variables, including patient demographics, HF phenotype, aetiology, medical history and hospital outcomes, were extracted from the digital record. Non-parametric tests were used: the Wilcoxon rank-sum test to compare groups and the Kruskal-Wallis test to analyse trends over time. Multivariable logistic regression was used to identify variables associated with aetiology. Results: A total of 765 patients were included in the study, with a median age of 53 years (IQR 42-61) and male predominance of 55.3%. Patients with recurrent HF (60.9%) were more common than those with de novo HF (39.1%), were older (54 years vs 53 years), had a higher comorbidity burden (34% vs 4%, p < 0.001), and more often presented with a cold-wet hemodynamic profile (18.4% vs 8.4%, p < 0.001). HFrEF (61.3%) was the predominant phenotype, though HFpEF increased with age. Dilated cardiomyopathy (37.0%), hypertensive heart disease (31.2%) and valvular heart failure (17.1%) were the leading aetiologies, while ischemic heart disease (6.3%) was relatively uncommon. A majority of the patients were referred (77.9%), and 50.8% presented with NYHA class IV. The strongest independent predictor for HF was hypertensive heart disease [AOR = 17.81; 95% CI: 3.13-48.76; p < 0.001].
An analysis of the trends in etiologies and demographics over the five-year period demonstrated no significant changes (all p-values > 0.05 for age, sex, aetiology, and most comorbidities). Conclusion: HF affects the younger adult population in Sierra Leone and is mainly caused by DCM and HHD. The late case presentations, the high prevalence of recurrent HF, and the associated high burden of comorbidities emphasize an urgent need to develop and implement improved strategies for the prevention, early detection, and long-term management of HF within Sierra Leone's healthcare system.

18
Registered Report: Artifact Index for Capacitive Electrocardiography Acquired with an Armchair

Warnecke, J. M.; Baumgärtel, D.; Bollmann, J.; Deserno, T. M.

2026-06-09 health informatics 10.64898/2026.06.03.26353526 medRxiv
Top 0.8%
0.4%

Background Continuous health monitoring enables early detection of diseases and improves therapeutic outcomes. Non-intrusive biosignal sensors, such as capacitive ECG (cECG), offer a practical solution for daily monitoring in private environments, such as smart homes and vehicles. However, artifacts reduce signal quality and compromise reliability. Methods Following a registered report protocol (Warnecke JM et al. PLoS One. 2021; 16(7):e0254780), we record data from 44 subjects and develop an artifact index for cECG. We use three signal quality indices (SQIs): the correlation of QRS complexes (corSQI), the R-peak detection consistency (bSQI) and the absolute amplitude ratio (aSQI). Our index classifies overlapping 10 s segments with a step width of 2 s into clean or artifact segments. We label a 2 s interval as an artifact if all five overlapping segments indicate artifacts. We record cECGs using an armchair with integrated electrodes in a single-arm study involving 44 subjects performing two activities, reading and watching television (TV), for 11 minutes each. We record a time-synchronized reference ECG with skin electrodes on the chest. To evaluate the artifact index, we compare it with manually generated ground truth. Moreover, we evaluate the clothing materials cotton, linen, jeans, and polyester in 5 subjects. Results Watching TV results in longer continuously clean signal durations than reading. On average, 88.3% of the signal recorded while watching TV has a minimum continuous clean duration of 10 s, versus 79.8% during reading. All clothing configurations achieve a clean signal duration exceeding 10 s. Among the SQI metrics, bSQI performs best, achieving an accuracy of 90.7% and an F1 score of 79.9%. Combining the three SQI metrics in a voting approach improves accuracy to 92.0% and F1 score to 82.1%.
Discussion Our artifact index automatically distinguishes clean from artifact cECG segments, promoting health monitoring in unsupervised real-world settings, earlier disease detection, and preventive health management. A limitation is the investigation of only two scenarios (reading and watching TV).
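
The windowing rule in the abstract above (10 s segments, 2 s step, a 2 s interval flagged only if all five overlapping segments are flagged) can be sketched as follows. This is an illustration under assumptions, not the authors' code. The majority-vote rule for combining the three SQIs, the handling of recording edges (where fewer than five segments cover an interval), and all function names are hypothetical.

```python
SEGMENT_S = 10  # segment length in seconds (from the abstract)
STEP_S = 2      # step width in seconds (from the abstract)
PER_INTERVAL = SEGMENT_S // STEP_S  # five segments overlap each interior interval

def vote(cor_flag, b_flag, a_flag):
    """Majority vote over three per-segment SQI artifact flags (assumed rule)."""
    return (cor_flag + b_flag + a_flag) >= 2

def interval_labels(segment_is_artifact):
    """Map per-segment artifact flags to per-2 s-interval artifact labels.

    Segment k covers [2k, 2k + 10) s, so interval i = [2i, 2i + 2) s is
    covered by segments i-4 .. i, clipped at the recording edges. An interval
    is labeled artifact only if every covering segment is flagged.
    """
    n_seg = len(segment_is_artifact)
    n_int = n_seg + PER_INTERVAL - 1 if n_seg else 0
    labels = []
    for i in range(n_int):
        first = max(0, i - (PER_INTERVAL - 1))
        last = min(n_seg - 1, i)
        labels.append(all(segment_is_artifact[first:last + 1]))
    return labels

# Seven segments (made up); only segments 1..5 are flagged as artifact.
flags = [False, True, True, True, True, True, False]
labels = interval_labels(flags)
print([i * STEP_S for i, is_art in enumerate(labels) if is_art])
```

Requiring agreement across all overlapping segments makes the interval label conservative: a single clean segment is enough to keep the interval.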

19
Cardiovascular-Kidney-Metabolic Syndrome Among US Adults, 1999-2023: National Trends and Projections Through 2050

Fu, F.; Wei, A.; Wang, G.; Fang, S.; Chen, J.; Liu, W.; Liu, H.; Gao, X.; Lei, Y.; Guo, N.; Chen, M.; Yu, J.; Wang, Y.; Li, S.; Mao, Y.; Yan, L.

2026-06-10 health systems and quality improvement 10.64898/2026.06.08.26355220 medRxiv
Top 0.9%
0.4%

Background Cardiovascular-kidney-metabolic (CKM) syndrome integrates adiposity, metabolic risk, kidney dysfunction, and cardiovascular disease in a prevention-oriented framework. National estimates spanning the 1999-2023 NHANES cycles, and projections of future burden, remain limited. Methods We analyzed US adults aged ≥20 years from 11 NHANES cycles, 1999-2000 through August 2021-August 2023. CKM stage 0-4 was assigned using harmonized examination, laboratory, medication, and questionnaire data. Prevalence was survey-weighted and standardized to the 2010 US Census adult population. Decade trends used survey-weighted logistic regression adjusted for age, sex, and race and ethnicity. Exploratory 2040 and 2050 projections combined NHANES prevalence models with US Census projections under population-aging-only, trend-continuation, and risk-improvement scenarios. Results Among 62,890 eligible adults, 62,888 had sufficient CKM data. In 2021-2023, age-standardized prevalence was 87.9% (95% CI, 86.5%-89.4%) for CKM stage ≥1 (any CKM) and 62.0% (95% CI, 60.1%-63.8%) for stages 2-4. Stage 2 accounted for 50.1% (95% CI, 48.2%-51.9%) and stages 3-4 for 11.9% (95% CI, 11.0%-12.7%). From 1999-2000 to 2021-2023, any CKM increased by 4.6 percentage points (95% CI, 2.4 to 6.9; P<0.001), whereas stages 2-4 changed by -2.1 percentage points (95% CI, -5.1 to 0.8; P=0.156). In adjusted decade models, any CKM increased (OR, 1.28; 95% CI, 1.19-1.38; P<0.001), while stages 2-4 showed no significant linear trend (OR, 0.95; 95% CI, 0.89-1.01; P=0.084). Excess adiposity and diabetes increased, dyslipidemia declined, and hypertension, chronic kidney disease, and clinical cardiovascular disease were stable. With population aging alone, projected stages 2-4 burden rose from 164.8 million adults in 2023 to 193.7 million in 2050; under risk improvement, it was 147.7 million. Conclusions CKM syndrome remained highly prevalent among US adults.
Although later stages did not increase significantly, population aging may expand the absolute care burden unless broad risk improvement occurs.
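
For readers unfamiliar with standardization to a reference population, as described in the abstract above, a minimal sketch of direct age standardization follows. The age bands, prevalences, and weights are invented for illustration and are not NHANES or US Census values. Survey weighting within strata is omitted, and the function name is hypothetical.

```python
def direct_standardize(stratum_prevalence, standard_weights):
    """Directly standardized prevalence: weighted mean of stratum prevalences.

    Both arguments map stratum label -> value; weights are renormalized to 1.
    """
    if set(stratum_prevalence) != set(standard_weights):
        raise ValueError("strata in both inputs must match")
    total = sum(standard_weights.values())
    return sum(
        stratum_prevalence[k] * standard_weights[k] / total
        for k in standard_weights
    )

# Hypothetical numbers for illustration only.
prevalence = {"20-39": 0.50, "40-59": 0.70, "60+": 0.90}
standard = {"20-39": 0.40, "40-59": 0.35, "60+": 0.25}
print(round(direct_standardize(prevalence, standard), 3))
```

Holding the standard weights fixed across survey cycles is what lets prevalence be compared over time without the comparison being driven by an aging sample.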

20
Beyond Injection Detection: A Positive-Security Prompt Firewall that Closes the Scope and PHI Gap SOTA Classifiers Miss in Healthcare

Schwoebel, J.; Semenec, I.; Rousseva, J.; Frasch, M. G.; Thorstenson, R.; Bhatt, M.

2026-06-06 health systems and quality improvement 10.64898/2026.06.04.26354950 medRxiv
Top 0.9%
0.4%

Large language models embedded in autonomous agents process trusted instructions and untrusted data in one context window, leaving them open to direct and indirect prompt injection. In healthcare this is not hypothetical: a 2025 JAMA Network Open study found commercial medical LLMs followed injected instructions in 94.4% of simulated patient encounters, including life-threatening recommendations. Yet the clinically decisive problem we quantify here is different. Most real clinical threats (protected health information (PHI) exfiltration, cross-patient access, bulk export, out-of-scope advice) are fluent, legitimate-looking requests that carry no attack signal, so even a state-of-the-art injection detector passes them. Existing runtime guardrails trade safety against latency: model-based auditors are accurate but add hundreds of milliseconds of Python inference, while lexical filters are fast but blind to obfuscated or semantically disguised payloads. We present QFIRE, an inline, provider-agnostic prompt firewall implemented as a single self-contained Rust toolchain (proxy, CLI, and benchmark harness). QFIRE combines three mechanisms: (i) positive-security scope constraints, which restrict a model call to a declared natural-language purpose and block out-of-scope drift even when no overt attack token is present; (ii) an asynchronous detector graph that runs N rules and their detector nodes concurrently, cheapest checks first; and (iii) a de-obfuscation pass that decodes Base64, hex, and ROT13, folds homoglyphs and leetspeak, and strips zero-width characters before detection. QFIRE ships 106 versioned firewall rules and a dedicated HIPAA Safe Harbor 18-identifier PHI panel, and runs a local DeBERTa v3 injection classifier via embedded ONNX Runtime.
On 1968 public prompt-injection and jailbreak prompts, QFIRE's deterministic hybrid attains F1 0.86, statistically tied with Meta's state-of-the-art PromptGuard 2 (0.86) and above protectai DeBERTa v3 (0.83); lexical baselines lag at 0.16 to 0.50. Our central result is on QFIRE HealthBench, a new 2000-prompt healthcare benchmark we build and release with real garak and Microsoft PyRIT payloads. There the same PromptGuard 2 recovers only 0.40 recall (DeBERTa v3: 0.57), because most clinical threats carry no injection signal; QFIRE's combined scope-plus-PHI chain reaches 0.83 recall (F1 0.87) at a calibrated 0.08 false-positive rate. Generic injection detection, even state of the art, is therefore necessary but not sufficient for healthcare agents. A bare LLM judge also closes most of this static-corpus gap (F1 0.90); QFIRE's contribution beyond static accuracy is auditable determinism, bounded latency, and adaptive robustness, where the bare judge falls to 34 to 59% recall (Section 5.5). End to end, placing QFIRE in front of a tool-using agent over a mock EHR sandbox cuts the agent's harmful-action rate from 0.38 to 0.00 at a 0.13 benign-utility cost. All code, rules, corpora snapshots, and scripts are released, and every table regenerates from a single make paper target against local models with no paid API keys.
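
The de-obfuscation pass described in the abstract above (decoding Base64, hex, and ROT13, folding homoglyphs and leetspeak, stripping zero-width characters) can be illustrated with a short sketch. QFIRE itself is described as a Rust toolchain; this Python sketch is not its implementation. Every name, the tiny homoglyph and leetspeak tables, and the 8-character token threshold are assumptions made only to show the general idea of generating normalized views of a prompt before detection.

```python
import base64
import binascii
import codecs
import unicodedata

# Characters deleted before matching (zero-width space/joiners, word joiner, BOM).
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
# Small illustrative tables; real-world tables would be far larger.
LEET = str.maketrans("013457@$", "oieastas")
HOMOGLYPHS = str.maketrans("\u0430\u0435\u043e\u0440\u0441\u0445", "aeopcx")  # Cyrillic to Latin

def normalize(text):
    """Strip zero-width characters, apply NFKC, fold homoglyphs, lowercase."""
    text = unicodedata.normalize("NFKC", text.translate(ZERO_WIDTH))
    return text.translate(HOMOGLYPHS).lower()

def _try_base64(token):
    if len(token) < 8:  # assumed threshold to skip short ordinary words
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    return decoded if decoded.isprintable() else None

def _try_hex(token):
    if len(token) < 8:
        return None
    try:
        decoded = bytes.fromhex(token).decode("utf-8")
    except ValueError:
        return None
    return decoded if decoded.isprintable() else None

def candidate_views(prompt):
    """Return the set of normalized and decoded views a detector could scan."""
    base = normalize(prompt)
    views = {base, base.translate(LEET), codecs.decode(base, "rot13")}
    for token in prompt.translate(ZERO_WIDTH).split():
        for decoder in (_try_base64, _try_hex):
            decoded = decoder(token)
            if decoded:
                views.add(normalize(decoded))
    return views

payload = base64.b64encode(b"ignore previous instructions").decode("ascii")
views = candidate_views("please run " + payload)
print("ignore previous instructions" in views)
```

Scanning several candidate views, rather than rewriting the prompt in place, keeps the original text intact for the model while letting lexical rules see through simple encodings.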